Occupancy Modeling

Kim Rivera & Travis Gallo

Who we are…

Kimberly Rivera

krivera@lpzoo.org

UWIN Research Coordinator, Lincoln Park Zoo Chicago


Research interests:

  • Passive sampling to study wildlife ecology and behavior

  • Disentangling human-wildlife interactions

  • Engaging communities in ecology

Who we are…

Travis Gallo

tgallo@umd.edu

Assistant Professor, University of Maryland


Research interests:

  • Urban biodiversity

  • Quantitative ecology

  • Nature equity

  • Animal movement (new)

Occupancy studies

We will get a little into the nuts and bolts, but generally with occupancy models we are modeling the occurrence of a “thing”.


The general approach is to treat presence and absence as a Bernoulli random variable governed by the “success probability” \(\psi\). Effects of covariates on \(\psi\) can be modeled using a link function (the logit link in this case).
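As a rough illustration of that idea, the sketch below simulates presence/absence at a set of sites where \(\psi\) depends on one covariate through the logit link. The covariate name `forest`, the coefficient values, and the number of sites are all made up for this example.

```r
set.seed(42)

n_sites <- 200
forest  <- rnorm(n_sites)  # hypothetical standardized site covariate

beta0 <- -0.5  # assumed intercept on the logit scale
beta1 <-  1.2  # assumed slope on the logit scale

# The inverse-logit turns the linear predictor into a probability
psi <- plogis(beta0 + beta1 * forest)

# Presence (1) or absence (0) is one Bernoulli draw per site
z <- rbinom(n_sites, size = 1, prob = psi)

# Proportion of occupied sites below vs. above average forest cover
tapply(z, forest > 0, mean)
```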


We are taking the liberty of assuming that you have some knowledge of generalized linear models. But if you want to learn more, I am sure Mason can put together some regression workshops… ::winking-at-mason::

Occupancy studies

  • Occupancy studies have been around for a long time; the modeling framework we are talking about was introduced by MacKenzie et al. 2002

  • This framework requires data that are less intensive to collect than those needed for abundance estimation

  • But the rub is that it does require multiple visits to each site

Why study occupancy?

  • It’s great for extensive and large-scale wildlife monitoring programs

  • Can be used to model the distribution of species (e.g., range shifts)

  • Can let you know about habitat selection

  • Estimate species richness

  • Meta-population dynamics

  • Connectivity

Vasudev et al. 2021. Biol Cons.

Occupancy

  • Abundance is often the variable of choice when analyzing a population. However, it can be very difficult and time intensive to collect abundance data across large scales


  • A complementary variable, occupancy, is often used instead.


  • Occupancy is the probability that a site is occupied by a target species. In other words \(Pr({\text{abundance}>0})\)


  • We use cameras (for the most part) to learn if a species is present at or absent from a site
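To make the definition \(Pr(\text{abundance} > 0)\) concrete, here is a small sketch that simulates abundance at many sites and converts it to occupancy. It assumes abundance is Poisson distributed with mean `lambda`, which is an assumption chosen only for this illustration and not something the occupancy framework requires.

```r
set.seed(7)

n_sites <- 5000
lambda  <- 0.5  # assumed mean abundance per site

N <- rpois(n_sites, lambda)  # abundance at each site (usually unknown)
z <- as.integer(N > 0)       # occupancy state: 1 if at least one individual

mean(z)           # proportion of occupied sites in the simulation
1 - exp(-lambda)  # Pr(N > 0) under the Poisson assumption
```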

Our data is not presence/absence

Detection / non-detection

Think about our data as detection / non-detection, not presence / absence

Presence data arise from a two part process and tell us that:

  1. The species occurs at a site AND

  2. The species is detected by an investigator at a site

Absence data tell us that:

  1. The species does not occur at a site OR

  2. The species was not detected by an investigator

Therefore, we have measurement error. If you do not correct for imperfect detection, you cannot separate false absences from true absences.
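A quick simulation shows why this matters. The sketch below assumes a true occupancy probability of 0.6 and a detection probability of 0.4 (both made-up values) and a single visit per site. The "naive" estimate, the proportion of sites with a detection, lands near \(\psi \times p\) rather than \(\psi\).

```r
set.seed(10)

n_sites <- 1000
psi <- 0.6  # assumed true occupancy probability
p   <- 0.4  # assumed detection probability

z <- rbinom(n_sites, 1, psi)    # true state (never observed directly)
y <- rbinom(n_sites, 1, z * p)  # one visit: detection only possible if z = 1

c(true_occupancy  = mean(z),
  naive_occupancy = mean(y),    # proportion of sites with a detection
  expected_naive  = psi * p)
```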

For the sake of this class, we will assume the probability of a false positive = 0

Ecological vs. Observational Process

We need to partition variability between the underlying ecological process we are interested in and the observational process of data collection


Need to estimate detection probability (p)

To tease this apart we need detection/non-detection data collected repeatedly over time, and we need to construct detection histories.


For occupancy, the detection histories indicate the success or failure of sighting or detecting the “thing” at a location. For example, with 5 sampling occasions: 01010. The “thing” was observed on the second and fourth occasions, but not on the first, third, or fifth.


We construct these detection histories with Repeated Surveys
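In practice a set of detection histories is just a sites-by-occasions matrix of 0s and 1s. The sketch below uses three made-up sites to show how each row maps to a history string like the one above; the site and occasion names are hypothetical.

```r
# Hypothetical data: 3 sites (rows) x 5 sampling occasions (columns)
y <- matrix(c(0, 1, 0, 1, 0,
              0, 0, 0, 0, 0,
              1, 0, 0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), paste0("occ", 1:5)))

# Collapse each row into a detection-history string
histories <- apply(y, 1, paste, collapse = "")
histories

# Was the species detected at least once at each site?
ever_detected <- rowSums(y) > 0
ever_detected
```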

Detection histories

Repeat visits allow us to separate the ecological and observational process

  • \(\large{\psi_i:}\) Pr(site \(i\) is occupied)
  • \(\large{p_{ij}:}\) Pr(observing species at site \(i\) on survey \(j\) given they are present)
  • If the species is not present at site \(i\), the probability of detecting it is 0 for all \(j\) at that site.
  • If the species is present at site \(i\), then it is detected on survey \(j\) with probability \(p_{ij}\).

Detection histories to probability statements

Our two parameters, \(\psi\) and \(p\), can be combined in probability statements to describe the probability of having specific encounter histories. Here are two examples, with associated probability statements:

Detection history | Probability of having this history
01010 | \(\psi(1-p_1)p_2(1-p_3)p_4(1-p_5)\)
00000 | \((1-\psi) + \psi(1-p_1)(1-p_2)(1-p_3)(1-p_4)(1-p_5)\)


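Assuming sites are independent, the likelihood is the product of these probabilities over all \(S\) sites:

\(L(\psi, p \mid H_1, \ldots, H_S) = \prod_{i=1}^{S} \Pr(H_i)\)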
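The probability statements and the likelihood above can be sketched in a few lines of R. The helper `history_prob()` is a hypothetical function written for this example, and the values of `psi` and `p` are made up; for simplicity \(p\) is held constant across occasions.

```r
# Probability of one detection history h (a 0/1 vector) given psi and p.
# p can be a single value or one value per occasion.
history_prob <- function(h, psi, p) {
  p <- rep_len(p, length(h))
  # Probability of the observed sequence if the site is occupied
  pr_given_occupied <- prod(p^h * (1 - p)^(1 - h))
  if (any(h == 1)) {
    # At least one detection: the site must be occupied
    psi * pr_given_occupied
  } else {
    # All zeros: either unoccupied, or occupied but never detected
    (1 - psi) + psi * pr_given_occupied
  }
}

psi <- 0.6  # assumed occupancy probability
p   <- 0.3  # assumed constant detection probability

history_prob(c(0, 1, 0, 1, 0), psi, p)  # psi * p^2 * (1 - p)^3
history_prob(c(0, 0, 0, 0, 0), psi, p)  # (1 - psi) + psi * (1 - p)^5

# Likelihood for a small set of sites: product of the history probabilities
H <- rbind(c(0, 1, 0, 1, 0),
           c(0, 0, 0, 0, 0),
           c(1, 0, 0, 0, 0))
lik <- prod(apply(H, 1, history_prob, psi = psi, p = p))
lik
```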

Challenge

Write out a probability statement for a detection history of 11001

\(\psi p_1 p_2(1-p_3)(1-p_4)p_5\)

Occupancy studies require replication

  • Spatial replication: You randomly select ‘sites’ to monitor within a region of interest

  • Temporal replication: You revisit those ‘sites’ across a window in which you assume the occupancy status does not change (closure).

Main assumptions

  1. Closure

  2. No false positives

  3. Independence of occurrence and detection: occupancy status at one site should be independent of the occupancy status at another site (except for what we can explain with covariates). Detection probability should also be independent across repeated visits.

  4. Homogeneity of detection process: unexplained variation in detection can cause underestimates in occupancy (use covariates)

  5. Parametric assumptions: we assume that our statistical models (typically with covariates) are a reasonable abstraction of reality.

Deriving an occupancy model

Let’s use first principles and ask ourselves 3 questions:

Question 1

What is the customary statistical description of a presence/absence state (\(z_i\))?

A Bernoulli distribution would be the natural answer, thus:

\[z_i \sim Bernoulli(\psi)\]

where \(\psi\) is the expected occupancy probability

Deriving an occupancy model

Question 2

I think most people can see that a “thing” that we are looking for could be potentially missed… So, there is error in our presence/absence measurement (\(y\))


Sometimes we may have \(y = 0\) at a site where \(z = 1\). If we visited a site twice we could get [0,0], [0,1], [1,0], [1,1].


So, our question… What is a sensible statistical model for the variability in detection/non-detection measurements?

Again, a Bernoulli distribution would be the natural answer:

\[(y_i|z_i = 1) \sim Bernoulli(p)\]

Where \(p\) is the detection probability
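For the two-visit example, the Bernoulli model gives a probability for each of the four possible outcomes at an occupied site. The sketch below works them out for an assumed detection probability of 0.4 and assumes the two visits are independent.

```r
p <- 0.4  # assumed detection probability

# All possible outcomes over two visits to an occupied site (z = 1)
outcomes <- expand.grid(visit1 = 0:1, visit2 = 0:1)

# Independent Bernoulli(p) draws on each visit
outcomes$prob <- with(outcomes,
                      dbinom(visit1, 1, p) * dbinom(visit2, 1, p))
outcomes

# Probability of missing the species on both visits even though it is there
(1 - p)^2
```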

Deriving an occupancy model

Question 3

What about an absent site (\(z = 0\))? What are the possible presence/absence measurements and what statistical model might be used for this process?

We could again use a Bernoulli process \((y_i|z_i = 0) \sim Bernoulli(q)\), where \(q\) is the probability of a false-positive.

But we will assume that we have no false-positives for today. There are model extensions to account for false positives.

Deriving an occupancy model

If we put these questions together, we get the basis for a site occupancy model:

  • State Process: \(z_i \sim Bernoulli(\psi)\)

  • Observation Process: \(y_{ij}|z_i \sim Bernoulli(z_i \cdot p)\)

Where the latent variable \(z_i\) is the true state of occurrence at site \(i\) and \(\psi\) is the expected value of \(z\) also called the occupancy probability.

The observed variable \(y_{ij}\) is the measurement of occurrence at site \(i\) during survey \(j\) and is conditional on \(z_i\).

\(p\) is the detection probability of the “thing” at site \(i\) during survey \(j\). Note that the detection probability refers to all individuals together, not to the probability of detecting any single individual.

Deriving an occupancy model

  • State Process: \(z_i \sim Bernoulli(\psi)\)

  • Observation Process: \(y_{ij}|z_i \sim Bernoulli(z_i \cdot p)\)

The observation process is conditional on the state process, because the parameter of the Bernoulli distribution is the product of \(z_i\) and \(p\).

At unoccupied sites the product is 0 and only 0 observations can be made. If it occurs, then it can be observed at the detection probability \(p\).
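Simulating from the two processes is a handy way to check your understanding. The sketch below draws the latent state and then the observations conditional on it; the number of sites and surveys and the values of `psi` and `p` are assumptions for illustration.

```r
set.seed(1)

n_sites   <- 100
n_surveys <- 4
psi <- 0.5  # assumed occupancy probability
p   <- 0.3  # assumed detection probability

# State process: true occupancy at each site
z <- rbinom(n_sites, 1, psi)

# Observation process: each survey is Bernoulli(z_i * p).
# rep(z, times = n_surveys) lines z up with the column-wise fill of matrix().
y <- matrix(rbinom(n_sites * n_surveys, 1, rep(z, times = n_surveys) * p),
            nrow = n_sites, ncol = n_surveys)

# Unoccupied sites can only produce zeros
all(y[z == 0, ] == 0)

# Some occupied sites are never detected
sum(z == 1 & rowSums(y) == 0)
```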

Deriving an occupancy model

  • State Process: \(z_i \sim Bernoulli(\psi)\)

  • Observation Process: \(y_{ij}|z_i \sim Bernoulli(z_i \cdot p)\)

So a site occupancy model is just a mixture of two logistic regression models.
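To see how the two Bernoulli pieces combine into something you can estimate, here is a minimal sketch that writes the likelihood by hand and maximizes it with base R's `optim()`. It uses simulated data with constant \(\psi\) and \(p\); the function name `nll` and all numeric values are assumptions for this example, and this is not meant to show how unmarked is implemented internally.

```r
set.seed(2)

n_sites   <- 300
n_surveys <- 5
z <- rbinom(n_sites, 1, 0.6)  # assumed true psi = 0.6
y <- matrix(rbinom(n_sites * n_surveys, 1, rep(z, n_surveys) * 0.4),
            nrow = n_sites)   # assumed true p = 0.4

# Negative log-likelihood with psi and p on the logit scale
nll <- function(par, y) {
  psi <- plogis(par[1])
  p   <- plogis(par[2])
  J   <- ncol(y)
  d   <- rowSums(y)  # number of detections per site
  # Pr(history | occupied) for each site
  pr_occupied <- p^d * (1 - p)^(J - d)
  # All-zero histories can also come from unoccupied sites
  lik <- psi * pr_occupied + (1 - psi) * (d == 0)
  -sum(log(lik))
}

fit <- optim(c(0, 0), nll, y = y)

# Back-transform the estimates to the probability scale
est <- plogis(fit$par)
names(est) <- c("psi", "p")
est
```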

Deriving an occupancy model

Since these are probabilities, we can use the logit-link to fit linear predictors and test varying hypotheses.

\[ \begin{aligned} \text{logit}(\psi_i) &= \beta_0 + \beta_{forest} \times x_{forest,i} + \beta_{elev} \times x_{elev,i}\\ \text{logit}(p_{ij}) &= \alpha_0 + \alpha_{hour} \times x_{hour,ij} \end{aligned} \]

This is what we will do next with the unmarked package.

Fitting Occupancy models in R

You can fit occupancy models in R with the unmarked package

install.packages("unmarked")
library(unmarked)

unmarked has a lot of functions. Luckily, it has a vignette that shows you how to get started with the package.

You can open the unmarked vignette with:

vignette("unmarked")

You will also find a very helpful online community if you just Google it (e.g., ‘unmarked R’)

Working with unmarked

unmarked requires a special class of object to fit an occupancy model

This object is called an unmarkedFrame. For occupancy analyses, an unmarkedFrame contains:

  1. y - A matrix of detection/non-detection data where the rows are sites and the columns are sampling periods. If a site was not sampled on a given survey, it should have an NA in that cell

  2. siteCovs - a data.frame of covariates that vary at the site level. This should have as many rows as the detection matrix and one column per covariate. This should also be in the same row order as the detection matrix.

  3. obsCovs - a named list of length K where K is the number of observation covariates. Each element of this list is a data frame representing a covariate with the same dimensions and order as the detection/non-detection matrix.

my_occu <- unmarkedFrameOccu(y = y, siteCovs = siteCovs, obsCovs = obsCovs)
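Here is a minimal end-to-end sketch of those three pieces, using simulated data so it runs on its own. The covariate names (`forest`, `elev`, `hour`) follow the linear predictors shown earlier, but every value, coefficient, and object name here is made up for illustration.

```r
library(unmarked)

set.seed(123)
n_sites   <- 150
n_surveys <- 4

# Site-level covariates (hypothetical, standardized)
site_covs <- data.frame(forest = rnorm(n_sites), elev = rnorm(n_sites))

# Observation-level covariate: one value per site and survey
hour <- matrix(rnorm(n_sites * n_surveys), nrow = n_sites, ncol = n_surveys)

# Simulate the state and observation processes with assumed coefficients
psi <- plogis(0.2 + 1.0 * site_covs$forest - 0.5 * site_covs$elev)
z   <- rbinom(n_sites, 1, psi)
p   <- plogis(-0.3 + 0.6 * hour)
y   <- matrix(rbinom(n_sites * n_surveys, 1, z * p), nrow = n_sites)

# Pretend site 1 was not sampled on survey 4
y[1, 4] <- NA

# Bundle everything into an unmarkedFrame
umf <- unmarkedFrameOccu(y = y,
                         siteCovs = site_covs,
                         obsCovs = list(hour = as.data.frame(hour)))
summary(umf)

# Double formula: detection covariates first, occupancy covariates second
fit <- occu(~ hour ~ forest + elev, data = umf)
fit
```

The estimates are reported on the logit scale, so they should be compared with the simulated coefficients rather than with probabilities.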